ABSTRACT

By examining the relations among sections of the 
fundamental frequency contour used in judging an utterance as a 
question or statement, the experiment described in this report seeks 
a more detailed understanding of auditory-linguistic interaction in 
the perception of intonation contours. The perceptual process may be 
divided into stages (auditory, phonetic, phonological, syntactic, and 
semantic), but it must also be supposed that there is feedback from
higher to lower levels which may serve to correct or verify earlier
decisions. Perceptual "correction" of an auditory or phonetic
decision, in light of a higher linguistic decision, will presumably 
not occur if the lower decision is firm. Details of the organization 
and results of an experiment (conducted with Swedish and American 
subjects) are presented here along with implications for further 
research. (Author/VM) 
Auditory and Linguistic Processes in the Perception of Intonation Contours

Michael Studdert-Kennedy and Kerstin Hadding 

Haskins Laboratories, New Haven 



ABSTRACT 



The fundamental frequency contour of a 700-msec vocoded utterance,
"November" [no'vɛmbər], was systematically varied to produce 72 contours,
differing in f0 at the stress and over the terminal glide. The contours were
recorded (1) carried on the speech wave, (2) as modulated sine waves. Swedish
and American subjects classified (1) both speech and sine-wave contours as
either terminally rising or terminally falling (psychophysical judgments),
(2) speech contours as questions or statements (linguistic judgments). For
both groups, two factors acted in complementary relation to govern linguistic
judgments: perceived terminal glide and f0 at the stress. Listeners tended to
classify contours with an apparent terminal rise and/or high stress as
questions, contours with an apparent terminal fall and/or low stress as
statements. For both speech and sine waves, psychophysical judgments of
terminal glide were influenced by earlier sections of the contour, but the
effects were reduced for sine-wave contours, and there were several instances
in which speech psychophysical judgments followed the linguistic more closely
than the sine-wave judgments. It is suggested that these instances may
reflect the control exerted by linguistic decision over perceived auditory
shape.

The perception of spoken language may be conceived as a process conducted
at several successive and simultaneous levels. Auditory, phonetic,
phonological, syntactic, and semantic processes form a hierarchy, but
decisions from higher levels also feed back to correct or verify tentative
decisions at lower levels and to construct the final percept. Suitable
experiments (e.g., Warren, 1970) may demonstrate the control exercised by
higher-level on lower-level decisions, and the partial determination of
phonetic shape by phonological and syntactic rules is readily assumed by some
linguists (e.g., Chomsky and Halle, 1968, p. 24). However, the auditory level,
itself a complex of interactive processes by which an acoustic signal is
converted into a representation suitable for input to the phonetic component
(Fourcin, in press), is commonly taken to be relatively independent.

A few studies have questioned this assumption. Ladefoged and McKinney
(1963), for example, showed that judgments of the loudness of words presented
in a carrier sentence may be more closely related to the work done upon them
in phonation, that is, to their degree of stress, than to their acoustic
intensity. Allen (1971), replicating and extending the experiment, showed that
both acoustic level and inferred vocal effort may serve as cues for the
loudness of speech, and that individuals differ in the weight they assign to
these cues. Evidently, loudness judgment of speech may entail a relatively
complex process of inference, drawing upon more than one level of analysis.
The same may be true of pitch judgment: Hadding-Koch and Studdert-Kennedy
(1963, 1964, 1965) found that auditory judgments of listeners, asked to
assess fundamental frequency (f0) contours imposed synthetically on a carrier
word, seemed to be influenced by linguistic decisions. The present experiment
extends this earlier work and, by examining the relations among sections
of the f0 contour used in judging an utterance as a question or statement,
attempts a more detailed understanding of auditory-linguistic interaction in
the perception of intonation contours.1

The starting point for the study is the importance commonly attributed
to the terminal glide as an acoustic cue for judgment of an utterance as a
question or statement. Two related sets of questions present themselves.
The first concerns the basis for auditory judgments of the glide. From our
earlier study (Hadding-Koch and Studdert-Kennedy, 1963, 1964, 1965) it was
evident that listeners frequently judge a falling glide as rising and a
rising glide as falling. Is the origin of this effect auditory
(psychophysical) or linguistic? Our study left the question unanswered. There,
we systematically manipulated the contour of an utterance by varying f0 at the
stress peak, at the "turning point" before the terminal glide, and at the end
point. We then asked listeners to classify each contour as (1) question or
statement (linguistic judgment), (2) having a terminal rise or fall
(psychophysical judgment). The two tasks yielded remarkably similar results:
whether judging the entire contour linguistically or its terminal glide
psychophysically, listeners were influenced in similar ways by the overall
pattern of the contour. The outcome suggested that auditory judgments may have
been controlled, in part, by linguistic judgments. But the reverse
interpretation (that linguistic judgments of the entire contour were
controlled by auditory judgments of the terminal glide) is equally plausible
as long as we do not know the auditory capacity of listeners for judging the
terminal glides of matched nonspeech contours. The present study attempts to
resolve this ambiguity by including the necessary nonspeech judgments. Effects
observed only in the two types of speech judgment would then be compatible
with the first interpretation, while effects observed in all three types of
judgment would be compatible with the second.

At the same time, this study broaches a second, related set of questions.
These concern the roles of the various sections of the contour in determining
linguistic judgments. Previous studies, both naturalistic and experimental,
have suggested that listeners make use of an entire contour, not simply of
the terminal glide, in judging an utterance (see Gårding and Abramson, 1965;
Hadding-Koch, 1961; Hadding-Koch and Studdert-Kennedy, 1963, 1964, 1965).
For example, spectrographic analyses of Swedish speech have shown that, in
this language, "yes-no" questions normally display not only a terminal rise,
but also an overall higher f0 than statements (Hadding-Koch, 1961). Other
utterances in which the speaker wants to draw the listener's special
attention also display an overall high f0 and a terminal rise: in listening
tests the labels "question," "surprise," "interest" have been found to be
interchangeable (Hadding-Koch, 1961, pp. 126 ff.). If a speaker is not
interested or is asking a question to which he thinks he knows the answer,2
his utterances tend to display a lower overall f0 and a falling terminal
glide, similar to those of statements.

1 The acoustic correlates of intonation are said to be changes in one or more
of three variables: fundamental frequency, intensity, and duration, with
variations in fundamental frequency over time being the strongest single cue
(Bolinger, 1958; Denes, 1959; Fry, 1968; Lehiste, 1970; Lieberman, in press).
The present study is concerned with only one of these variables, fundamental
frequency, and the term "intonation contour" refers exclusively to contours
of fundamental frequency.

The importance of the entire contour may be reflected in the phonetic
description. If four f0 levels are postulated, with arrows showing the
direction of the terminal glide, the intonation contour of a typical Swedish
"yes-no" question could be described with one number at the beginning of
the utterance and two at the stress,3 as 3 44 2↑³ (the superscript 3
indicates the end point of the terminal glide) or, if less "interested," as
2 33 2↑. A neutral statement would be best described as 2 33 1↓, or even
2 22 1↓, though the latter might also indicate a certain indifference. Much
the same statement contour is typical of American English. However, questions
in this language are said to display a more or less continuously rising
contour (Pike, 1945; Hockett, 1955), which might be described as 2 22 3↑ or
2 33 3↑. Similar contours occur in Swedish echo-questions.4

These naturalistic observations of speech are, in general, consistent
with results of our experimental study of perception (Hadding-Koch and
Studdert-Kennedy, 1963, 1964, 1965). Swedish listeners selected a typical
Swedish question (2 44 2↑) among their preferred question contours, and a
lower contour with a level terminal glide (2 33 1→) among their preferred
statements (they would probably have preferred 2 33 1↓ for a statement had
this contour been included). The North American listeners also preferred
2 44 2↑ for a question and 2 33 1→ for a statement, but they were more
uncertain (in less agreement with one another) than the Swedish listeners,
perhaps because the contours were based on Swedish speech and did not
include, for example, a typical American English question.

2 Many workers who have reported, for various languages, that the same
intonation is used in questions as in statements seem to have been anxious to
exclude all emotional "overtones" and therefore told their subjects to speak
in a neutral voice. The result is that, in the absence of grammatical
Q-markers, utterances sound like statements. A "neutral" intonation is not
enough to convey, as sole cue, the impression of a question. If a question
is asked merely for form's sake, with no particular interest in the answer,
no difference in intonation is to be expected from that of a statement.

3 We write two numerals at the stress and one at the turning point, even though
they may be on the same "level" (intonation level, f0 level); cf. Delattre,
1963; Hockett, 1955.

4 Compare the similar difference in intonation contours for French suggested
by Léon, in press.

Granted, then, the importance of the entire contour, we may now ask
how its various sections work together to control linguistic judgment. Here,
let us recall a central finding of our previous study, namely that there was
perceptual reciprocity among various sections of a contour: listeners would
trade a high f0 at one point in the utterance for a high f0 elsewhere. For
example, an utterance with a relatively high f0 at peak or turning point
required a smaller terminal rise to be heard as a question than an utterance
with relatively low f0 at peak or turning point. We may interpret this
reciprocity in either of two ways. The first interpretation assigns only
auditory status to peak and turning point and assumes their linguistic role to
be indirect. Thus, an utterance is marked as question or statement by its
apparent terminal glide. Earlier sections of the contour are important only
insofar as they alter (by some mechanism to be specified) listeners'
perceptions of that glide and thereby give rise to the observed reciprocity
effects. Lieberman's (1967) account of our results rests squarely on these
assumptions. He selects an "analysis-by-synthesis" mechanism to account for
the reciprocity.

An alternative interpretation assigns a direct linguistic function to peak
and turning point. An utterance is marked as question or statement not only by
its terminal glide, but also by the f0 pattern over its earlier course.
Listeners discover at least two acoustic cues within a contour, either or both
of which may control their linguistic decision. The weighting of these cues (by
some unknown mechanism) gives rise to the reciprocity observed in linguistic
judgments.

A second purpose of this study was to distinguish between these accounts,
again by extending our earlier work to include judgments of the terminal glides
of matched nonspeech contours. Effects present in all three types of judgment
would then require the first interpretation but would exclude an account, such
as that of Lieberman (1967), that invoked specialized speech mechanisms.
Effects present only in the two types of speech judgment would be compatible
with both the first interpretation and Lieberman's hypothesized mechanism.
Effects present only in the linguistic judgments would require the second
interpretation.

Finally, an additional purpose of the study was to extend our
cross-linguistic comparison of Swedish and American English listeners. We
therefore enlarged the set of contours to include typical questions and
statements from both American English and Swedish.



METHOD 

The stimuli were prepared by means of the Haskins Laboratories Digital
Spectrum Manipulator (DSM) (Cooper, 1965). This device provides a
spectrographic display of a 19-channel vocoder analysis, digitized to 6 bits
at 10-msec intervals, and permits the experimenter to vary the contents of
each cell in the frequency-time matrix before resynthesis by the vocoder. For
the present study we were interested in the channel that displayed the time
course of the fundamental frequency of the utterance, since it was by
manipulating the contents of this channel that we varied f0.



The utterance "November" [no'vɛmbər] was spoken by an American male voice
into the vocoder and stored in the DSM. f0 was then manipulated over a range
from 85 cps to 220 cps. The values at the most important points of the
contours (starting point, peak, turning point, and end point) were chosen to
represent four different f0 levels of a speaker with a range from 65 cps to
250 cps. The four levels were based on a previous analysis of a long sample
of speech by a speaker with this particular range (Hadding-Koch, 1961, p.
110 ff.).5



The contours are schematized in Figure 1. They range between two poles
that may be marked 2 44 3↑⁴ and 2 11 1↓. All contours start on an f0 of 130
Hz (level 2), sustained for 170 msec over the first syllable. They then
move, during 106 msec, to one of three peaks: 130 Hz (L, or low, level 2),
160 Hz (H, or high, level 3), 200 Hz (S, or superhigh, level 4). They
proceed, during 127 msec, to one of four turning points: 100 Hz (high level 1),
120 Hz (level 2), 145 Hz (low level 3), 180 Hz (high level 3). Finally,
they proceed, during 201 msec, to one of six end points: 85 Hz (level 1),
100 Hz (high level 1), 120 Hz, 145 Hz, 180 Hz, and 220 Hz (level 4).
Peak, turning point, and end point are each sustained for 32 msec. The
combination of three peaks, four turning points, and six end points yields 72
contours, each specified by a letter and two numbers (e.g., S24, L36) and
each lasting 700 msec.
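
The stimulus set just described can be sketched in a few lines of Python. The sketch is offered only as a check on the arithmetic, not as a description of how the DSM was programmed. It assumes that the first digit of a contour label indexes the turning point (1 to 4) and the second the end point (1 to 6), a reading consistent with the labels used in the text (e.g., L45 is terminally level); the variable and function names are illustrative.

```python
from itertools import product

# f0 targets in Hz, as listed in the text.
START_HZ = 130
PEAKS_HZ = {"S": 200, "H": 160, "L": 130}
TURNING_POINTS_HZ = {1: 100, 2: 120, 3: 145, 4: 180}
END_POINTS_HZ = {1: 85, 2: 100, 3: 120, 4: 145, 5: 180, 6: 220}

# Segment durations in msec, read from the text: transitions alternate with
# plateaus at the start, peak, turning point, and end point.
SEGMENTS_MSEC = [
    ("start plateau", 170),
    ("rise to peak", 106),
    ("peak plateau", 32),
    ("peak to turning point", 127),
    ("turning-point plateau", 32),
    ("terminal glide", 201),
    ("end-point plateau", 32),
]


def build_contours():
    """Return a dict mapping labels such as 'S24' to their f0 targets."""
    contours = {}
    for (peak, peak_hz), (tp, tp_hz), (ep, ep_hz) in product(
        PEAKS_HZ.items(), TURNING_POINTS_HZ.items(), END_POINTS_HZ.items()
    ):
        contours[f"{peak}{tp}{ep}"] = {
            "start_hz": START_HZ,
            "peak_hz": peak_hz,
            "turning_hz": tp_hz,
            "end_hz": ep_hz,
            # Terminal glide: rise (positive) or fall (negative) in Hz.
            "terminal_glide_hz": ep_hz - tp_hz,
        }
    return contours


CONTOURS = build_contours()
TOTAL_MSEC = sum(duration for _, duration in SEGMENTS_MSEC)
```

Under these assumptions the enumeration gives 72 labels and the segment durations sum to 700 msec, in agreement with the figures stated above.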



The 72 contours were recorded on magnetic tape from the output of the
vocoder in three forms: (1) carried on a speech wave [no'vɛmbər], (2) as a
frequency-modulated sine wave, (3) as a frequency-modulated train of pulses.
Each set of 72 was spliced into five different random orders with a
five-second interval between stimuli and a ten-second pause after every tenth
stimulus. They were presented to Swedish and U.S. subjects as described
below.

Swedish Subjects. Twenty-two graduate and undergraduate volunteers were
tested in three sessions, each lasting about 45 minutes. They listened to the
tests over a loudspeaker at a comfortable listening level in a quiet room.
In a given session they heard the five test orders for one type of stimulus
only. They were divided into two groups of 11. Both groups heard the
sine-wave stimuli first; this was an important precaution intended to exclude
any possible influence of speech mechanisms on judgments of the nonspeech
stimuli. In the second and third sessions both groups made psychophysical or
linguistic judgments on the speech stimuli, group 1 in the order
psychophysical-linguistic, group 2 in the reverse order. In the sine-wave
session and in the psychophysical speech session, subjects were asked to
listen to the final glide of each contour and judge whether it was rising or
falling. In the linguistic speech session subjects were asked to judge each
contour as more like a question or more like a statement. For each contour,
the procedure yielded 5 judgments by each subject under each condition, a
total of 110 judgments.

5 One of the contentions of that study, based on a number of utterances in
continuous speech by several Swedish subjects, was that every speaker has,
in addition to a general speaking range, clusters of "favorite pitches"
which he uses, for instance, on stressed segments of statements (represented
by the H-peak in the present study), and a higher level which he uses
for questions and various expressions of "interest" (here represented by
level 4; cf. also Bolinger, 1964).

Statements were found in that study to end on a low level; hesitant or
exclamatory utterances, higher up. Questions tended to have a terminal rise,
usually from level 2, or a fall ending comparatively high. Questions were
also generally spoken with an overall high f0 compared to statements, a
phenomenon that, according to the literature, occurs in many languages
(Hermann, 1942; Bolinger, 1964). The contour then often started high. Polite
or friendly statements too might end with a final rise, but from a
comparatively low level and with a moderate range (cf. Uldall, 1962).

Figure 1. Schema of fundamental frequency contours imposed on the utterance
"November" [no'vɛmbər].



U.S. Subjects. Sixteen female undergraduate paid volunteers were divided
into two groups of eight. The procedure duplicated that followed with the
Swedish subjects, except that the U.S. subjects listened to the tests over
earphones in individual booths. The output of the phones was adjusted by
means of a calibration tone to be approximately 75 dB SPL. These subjects
also made psychophysical judgments on the pulse-train stimuli; these were
counterbalanced with the sine waves in the first two sessions before the
speech stimuli had been heard. The procedure yielded a total of 80 judgments
on each contour under each condition.
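
The judgment totals for the two groups follow directly from the design (number of subjects times five test orders). A minimal Python check, with illustrative names and with the per-condition response totals derived here rather than reported in the text:

```python
N_TEST_ORDERS = 5   # each set of 72 contours was heard in five random orders
N_CONTOURS = 72


def judgments_per_contour(n_subjects, n_orders=N_TEST_ORDERS):
    """Judgments collected on one contour under one condition."""
    return n_subjects * n_orders


swedish_per_contour = judgments_per_contour(22)   # 22 Swedish subjects
us_per_contour = judgments_per_contour(16)        # 16 U.S. subjects

# Derived totals over all 72 contours in a single condition.
swedish_per_condition = swedish_per_contour * N_CONTOURS
us_per_condition = us_per_contour * N_CONTOURS
```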

RESULTS 

No systematic differences between groups due to the order in which they
made their judgments were observed. Data are therefore presented for the
combined groups throughout. Figures 2 and 4 display the Swedish data, Figures
3 and 5 the U.S. data. In each figure the left column gives the linguistic,
the middle column the speech psychophysical, and the right column the sine-wave
data.6 Percentages of question and statement judgments (linguistic) or of rise
and fall judgments (speech psychophysical and sine-wave) are plotted against
terminal glide, measured as rise (positive) or fall (negative) in Hz, from
turning point to end point. In Figures 2 and 3 parameters of the curves are f0
values at peaks (S, H, L), displayed for the four turning-point f0 values from
1 (top) to 4 (bottom). In Figures 4 and 5 parameters of the curves are f0
values at turning points (1, 2, 3, 4), displayed for the three peak f0 values
of S (top), H (middle), and L (bottom).

Linguistic Judgments 

Cross-Language Comparisons

Before considering the acoustic variables controlling linguistic judgments,
we will briefly compare Swedish and U.S. results. The main drift of the data
is very similar for the two groups. A broad description of preferred statement
and question contours for both groups can be given.

Statements. Figure 6 schematizes the most frequently preferred contours,
those obtaining 90% or better agreement. For all these contours, except two
(L13; H13, Swedish only), the final f0 of the terminal glide is the lowest f0
of the utterance. In addition, the contours display at least one of the
following: terminal fall, low or middle turning point (1, 2, 3), low or high
peak (L, H). The range of preferred contours includes the 2 33 1↓ and 2 22 1↓
contours, suggested as typical by previous observations, but many others are
equally acceptable. For example, the superhigh peak, even when followed by a
high (S4, U.S. only) or moderately high (S3) turning point, is accepted as a
statement provided the terminal fall is large enough; the lower the turning
point (i.e., the larger the fall from the peak), the less the needed terminal
fall (see S series, Figures 4 and 5). On the other hand, some terminally
level contours (H23, H12, L23, L12) and even terminally rising contours (H13,
Swedish only; L13) are also accepted as statements. Evidently the terminal
fall is not essential, if preceding sections of the contour are low enough
(L) or are falling from a moderate level (H).

6 Judgments of the modulated sine waves and pulse trains by U.S. subjects were
essentially identical. Accordingly, only sine-wave data are presented here.

Figures 2-5. Percentages of question or rise responses (left axis) and
statement or fall responses (right axis) plotted as functions of terminal
glide in Hz. Figures 2 (Swedish subjects) and 3 (American subjects):
turning-point values are constant across rows and peak values are parameters
of the curves. Figures 4 (Swedish subjects) and 5 (American subjects): peak
values are constant across rows and turning points are parameters of the
curves.

Figure 6. Schemata of preferred statement and question contours. Included are
all contours for which at least 90% of the judgments of a given language
group were in a single category.



Broadly, then, peak, turning point, and terminal glide engage in trading
relations such that the contour of an acceptable statement has a low to high
(rarely, and for U.S. only, superhigh) peak and is, over some portion of its
later course, low, falling, or both. (Two anomalous series, H4 and L4, are
discussed below under Swedish-U.S. differences.)

Questions. Figure 6 also schematizes contours obtaining 90% or better
agreement on a question judgment. For all these contours, the terminal glide
is rising and the final pitch of the glide is the highest of the utterance
(cf. Uldall, 1962, p. 780; Majewski and Blasdell, 1969). The range of
preferred contours includes the expected continuously rising 2 22 3↑ (L36, L46)
and 2 33 3↑ (H46) of American English and the superhigh peak contour, 2 44 2↑
(S26), of Swedish, but other contours are also accepted. For example, initially
low and falling contours (L1, L2) are heard as questions if the terminal rise
is large enough. At the same time, even a terminally level contour (L45,
Figures 2-5) gathers more than 80% question judgments from both groups, when
the preceding section of the contour has been steadily rising. In fact, this
steady rise is a peculiarly powerful question cue that may quite override a
large terminal fall that would otherwise cue a statement (cf. H4, L4,
discussed below). Again there are trading relations among components of the
contour, such that a generally accepted question displays either a rise from
peak to turning point (H4, L3, L4) and a relatively small terminal rise, or a
fall from peak to turning point and a relatively large terminal rise.

Swedish-U.S. differences. As we have seen, the similarities between
Swedish and U.S. judgments are more striking than the differences. The
stimulus series included a number of contours presumably unfamiliar to one or
other or both groups from their linguistic experience. Yet both groups were
able to generalize such contours with more familiar patterns, classifying
contours with a relatively high overall pitch as questions, contours with a
relatively low overall pitch as statements. Nonetheless, small systematic
differences are present.

(1) A comparison of Swedish and U.S. responses to the falling contours
of the S2, S3, S4 series (Figures 4 and 5, top left) shows that U.S. subjects
tended to give more statement responses than Swedish subjects. The effect is
particularly marked for the S4 series, on which Swedish statement judgments
never reach 90% agreement: a high peak with a high turning point is difficult
for Swedish subjects to hear as a statement. This may reflect the fact that
Swedish statement intonation shows an earlier fall to a low level after stress
than does English. At the same time, it may be taken as an indirect
reflection of a Swedish preference for an overall high contour on questions, so
that utterances displaying such a contour are difficult to hear as statements
even when completed by a low terminal fall. It is true that the S4 series,
which had been expected to collect a large number of question responses due
to its overall high level, never obtained 90% agreement on a question
judgment from either group. But a check of these items revealed that they gave
an impression of protest or indignation rather than of questioning, probably
because the low precontour was heard in opposition to the rest of the
utterance. A precontour on level 3 might have eliminated this impression and
would also have been more similar to what actually occurs in Swedish questions.

(2) As was remarked above, the continuously rising contours (L4 and, to
some extent, L3 and H4; see Figures 2 and 3, lower left) were readily accepted
by both groups as questions, despite the fact that many of them are unlikely
to occur in natural speech. L4, with its low peak rising 50 Hz to the
turning point, and H4, with its high peak rising 20 Hz, were preferred to L3
with its low peak rising only 15 Hz. Furthermore, H4 and, especially, L4
elicited relatively few statement responses, even when their terminal glides
were falling sharply. U.S. subjects identified these contours as statements
even less frequently than the Swedish group. This may reflect the fact that
the steadily rising question contour is more widely used in American English
than in Swedish and so might be peculiarly difficult for Americans to hear as
a statement even when completed by a terminal fall.

In short, the differences between the two groups are small but in
directions predictable from linguistic analysis.

Variables Controlling Linguistic Judgments 

Terminal glide is the single most powerful determinant of linguistic
judgments. None of the highly preferred question contours and few of the highly
preferred statement contours (Figure 6) lack the appropriate terminal rise or
fall. Given a sufficiently extensive terminal glide, earlier sections of the
contour have small importance. At the same time, Figures 2-5 show that f0
values at peak and turning point may also play a role.

To provide a consistent criterion for the estimate of peak and
turning-point effects, the median of the response distribution for each subject
on each series was estimated. The median is the point of subjective equality,
the value of the terminal glide at which subjects identify a given contour as
a question or a statement 50% of the time. In other words, it is the point of
crossover from largely statement to largely question judgments. The means of
these medians, or crossover values, for the linguistic judgments are plotted
in Figure 7 (row A) for Swedish subjects (left) and U.S. subjects (right).
In the first and third plots mean medians are graphed as functions of peak
f0, with turning-point f0 as parameter; in the second and fourth, they are
graphed as functions of turning-point f0, with peak f0 as parameter.

7 We should probably have included a higher precontour, on level 3, to cover
the question contours properly, since the large rise to the highest peak
(from level 2 to level 4) gave some contours an unwanted and perhaps
dominating effect of protest rather than question (cf. footnote 5). However,
this would have meant a substantial increase in an already lengthy test.

Figure 7. Mean subject medians under the three experimental conditions.
Ordinate: mean median terminal rise (+) or fall (-) in Hz. In the first and
third columns, mean medians are plotted as functions of peak f0, with
turning-point f0 as parameter; in the second and fourth columns, they are
plotted as functions of turning-point f0, with peak f0 as parameter.
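
The text does not say how each subject's median was estimated from the response distribution. One simple possibility, given here only as a sketch under that assumption, is linear interpolation between the two adjacent terminal-glide values that bracket 50% question responses. The response percentages below are hypothetical and serve only to illustrate the computation; the function name is likewise illustrative.

```python
from statistics import mean


def crossover_hz(glides_hz, pct_question):
    """Terminal glide (Hz) at which the response curve first crosses 50%.

    Assumed method: linear interpolation between adjacent stimulus values.
    Returns None when the curve never reaches 50% within the tested range.
    """
    points = sorted(zip(glides_hz, pct_question))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == 50:
            return float(x0)
        if (y0 - 50) * (y1 - 50) < 0:
            return x0 + (50 - y0) * (x1 - x0) / (y1 - y0)
    if points and points[-1][1] == 50:
        return float(points[-1][0])
    return None


# Terminal glides for turning point 2 (120 Hz) and the six end points.
GLIDES_TP2_HZ = [-35, -20, 0, 25, 60, 100]

# HYPOTHETICAL percentages of "question" responses for two subjects
# (multiples of 20, since each subject judged each contour five times).
HYPOTHETICAL_SUBJECTS = [
    [0, 20, 40, 80, 100, 100],
    [0, 0, 20, 60, 100, 100],
]

subject_crossovers = [
    crossover_hz(GLIDES_TP2_HZ, pct) for pct in HYPOTHETICAL_SUBJECTS
]
mean_crossover = mean(subject_crossovers)
```

A positive crossover value means that some terminal rise is needed before question judgments predominate; a negative value means that questions predominate even with a terminal fall.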

Two cautions should be observed in studying these plots. First, it
should be remembered that a median is a single value drawn from the center
of its distribution. The relation between the medians of two distributions
does not always accurately represent the relations between the upper and
lower tails of those distributions. As long as two curves on any plot of
Figures 2 to 5 are roughly parallel, the difference between their medians will
give a reasonable estimate of their separation along the terminal glide axis.
Where there are severe departures from the parallel, the appropriate plots of
Figure 7 and of Figures 2 to 5 should be carefully read in conjunction. Second,
it should be remembered that the mean of the medians of several distributions
is not necessarily equal to the median of the combined distribution. Since
the values of Figure 7 are the means of subject medians, they do not always
agree exactly with the group median values read from Figures 2 to 5.
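
The second caution can be made concrete with a small numerical example. The two sets of values below are hypothetical and are not drawn from the data; they show only that the mean of two medians need not equal the median of the pooled values.

```python
from statistics import mean, median

# HYPOTHETICAL values (in Hz) for two subjects, chosen for illustration.
subject_a = [0, 10, 100]
subject_b = [20, 30, 40]

# Mean of the individual medians: (10 + 30) / 2.
mean_of_medians = mean([median(subject_a), median(subject_b)])

# Median of the combined distribution: middle of [0, 10, 20, 30, 40, 100].
pooled_median = median(subject_a + subject_b)
```

Here the mean of the subject medians is 20, while the median of the pooled values is 25.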

With these precautions in mind we return to row A of Figure 7. If the
direction of the terminal glide were the sole determinant of linguistic
judgments, we would expect all crossover values to fall at zero, the level of
the dashed horizontal lines across Figure 7. In fact, crossover values deviate
considerably from zero: both the direction and the extent of their deviation
vary with peak and turning point.

The peak effect (plots 1 and 3) is the smaller. For neither Swedish nor
U.S. subjects does a change of peak f0 from 130 Hz to 160 Hz (from L to H)
have any consistent, significant effect. But a change from 160 Hz to 200 Hz
(from H to S) does reliably reduce the crossover value for all contours,
except that having a turning point at 180 Hz for the U.S. group. (This reversal
is probably not reliable, as study of the bottom left plot of Figure 3 will
suggest.) These effects are statistically significant by matched pair t-tests
between medians for turning points 1, 2, and 3 in both groups (p<.05). They
may be clearly seen in the left columns of Figures 2 and 3. Reading down the
columns we note the leftward separation of the S curves. The separation is
reduced for turning point 3 and gives place to the L curve, with its steadily
rising contour, for turning point 4. We may also note that, as the terminal
rise increases, the peak effect in the upper three plots disappears. In short,
if the turning point is at a low to middle f0 and the terminal rise is slight,
a very high (level 4) peak at the stress leads to a significant increase in
the number of questions heard and, by corollary, to a significant decrease in
the number of statements.

The turning-point effect (plots 2 and 4 of Figure 7) is both larger and
more consistent than the peak effect. For all values of peak f0, an increase
in turning-point f0 is associated with a decrease in crossover value. The
decrease is significant by matched pair t-tests between medians (p<.05) for
all turning-point shifts, except those from 100 to 120 Hz for the Swedish S,
H, and L curves and for the U.S. S and H curves. The effect is also
considerably reduced if the contour has a peak at 200 Hz (S). (See top left
plots of Figures 4 and 5.) This again suggests that the high peak alone is a
powerful question cue for both language groups.
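The matched pair t-tests between medians referred to above can be sketched
as follows. The per-subject medians are hypothetical, the helper name is our
own, and a two-tailed criterion is assumed; none of these details is given
in the report.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(condition_a, condition_b):
    """Return (t, degrees of freedom) for a matched pair t-test."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1


# Hypothetical subject medians (Hz) for two turning points; NOT study data.
medians_low_turn = [12, 8, 15, 10, 9]
medians_high_turn = [5, 6, 7, 4, 8]

t, df = paired_t(medians_low_turn, medians_high_turn)

# Two-tailed .05 critical value of Student's t for 4 degrees of freedom.
T_CRIT_DF4 = 2.776
print(round(t, 3), df, abs(t) > T_CRIT_DF4)
```

Because each subject contributes a median under both conditions, the test is
run on the within-subject differences rather than on the two sets of medians
treated as independent samples.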



Psychophysical Judgments

Speech Waves

Psychophysical judgments of the speech-wave terminal glides differ from
and resemble linguistic judgments of the entire utterance in important ways.
The main difference may be seen in the center columns of Figures 2 and 3:
the effect of the high peak is absent from the Swedish data and much reduced
in the U.S. data. The main similarity may be seen in the center columns of
Figures 4 and 5: the turning-point effect is present and even more pronounced
than in the linguistic judgments.

Figure 7 (row B) summarizes the data. The peak effects (plots 1 and 3)
are inconsistent. An increase in peak f0 from 130 Hz (L) to 160 Hz (H) yields
in every instance, except the high turning-point series for Swedish subjects,
an increase rather than a decrease in the crossover value of the terminal rise.
Two of these increases (for turning points 1 and 2) are significant for both
groups (p<.05 by a matched pair t-test between medians). On the other hand, an
increase of peak f0 from 160 Hz to 200 Hz yields, for the Swedish subjects, two
increases and two decreases, none of them significant. The absence of a
consistent peak effect for the Swedish subjects is evident in the middle column
of Figure 2. For the U.S. subjects, the picture is somewhat different:
crossover values decrease from H to S for turning points 1, 2, and 3 and
increase for turning point 4, exactly as in the linguistic data. The effects
are reduced and statistically significant only for turning point 2. But a trend
is present and quite evident in the middle column of Figure 3.

The turning-point effect, on the other hand (center columns of Figures 4
and 5; plots 2 and 4, row B of Figure 7), is similar to and even more
pronounced than the corresponding effect in the semantic data. All shifts are
significant by matched pair t-tests (p<.05), except that from turning point 1
to 2 in the Swedish L series. For both groups, the higher the turning point,
the smaller the terminal rise needed for a rise to be consistently heard. The
similarity to the linguistic results is most marked for the H and L series
(second and third rows, Figures 4 and 5): H4 and L4 are again anomalous series,
readily heard as rising even when the terminal glide is falling. In the S
series the turning-point effect is even more pronounced than for the linguistic
judgments.

Sine Waves 



From the steepened functions of Figures 2 to 5 (right-hand columns) it
is evident that subjects were in better agreement on their sine-wave than on
their speech psychophysical or linguistic judgments. The two language groups
are also in close agreement, which gives some confidence that the differences
between their linguistic judgments are reliable.

Figures 2 and 3 (right-hand columns) show that the effect of the high 
peak is absent. As in the speech psychophysical data, low peak contours tend 
to be the most accurately judged, particularly by the Swedish. But the effects 
are neither fully consistent nor statistically significant (see plots 1 and 3, 
row C, Figure 7). 

On the other hand, the turning-point effects (plots 2 and 4, row C,
Figure 7) are clear, similar to those observed in the linguistic and speech
psychophysical data but considerably reduced. The effects are significant by
matched pair t-tests (p<.05) for all turning-point shifts, except those from
100 Hz to 120 Hz for the S and H curves in both groups, and may be seen in the
right-hand columns of Figures 4 and 5. Note that H4 and L4 are no longer
anomalous series.




DISCUSSION 

Cross-language comparisons. There are striking similarities between
Swedish and U.S. judgments of these intonation contours. Despite small, lin- 
guistically predictable differences, both groups tend to classify contours 
with a high peak or terminal rise as questions, contours with a low peak or 
terminal fall as statements. Hermann (1942) has pointed out the generality 
across languages, including Swedish, of a high pitch for questions (see also 
Hadding-Koch, 1961, especially pp. 119 ff.). Bolinger (1964), among others, 
has discussed the apparently "universal tendency" to use a raised tone to indi- 
cate points of "interest" within utterances and also to indicate that more is 
to follow, as in questions (cf. Hadding-Koch, 1965). The data of this experi- 
ment are consistent with these "universal tendencies." 

Perceptual relations within a contour. We are now in a position to
resolve some of the uncertainties left by our previous study. Consider, first,
the turning-point effect. Since this is present and significant under all 
three experimental conditions, we must assign it auditory status and assume 
that it takes linguistic effect indirectly by altering subjects’ perceptions 
of the terminal glide. Furthermore, since it is present, even though reduced, 
in the sine-wave data, our account of the process by which it affects per- 
ception of the terminal glide cannot invoke specialized mechanisms peculiar 
to speech. 

We may gather some idea of the process from a study of plots 2 and 4 in 
row B, Figure 7 or of the center plots in Figures 4 and 5. The terminal glide 
of a contour, such as H1, with a strong fall from peak to turning point (160 Hz
to 100 Hz) requires a terminal rise of about 50 Hz if it is to be judged 50%
of the time as rising; while the terminal glide of a contour, such as H4, with 
a steady rise for more than 200 msec before the terminal glide, is heard as 
rising 50% of the time, even when the glide is falling by about 50 Hz. Evi- 
dently listeners have difficulty in separating the terminal glide from earlier 
sections of the contour, if those earlier sections have a marked movement. The 
terminal glides of contours with a turning point (145 Hz in S3, H3, L3) close 
to the precontour level of 130 Hz are more accurately perceived: the median 
values are close to zero in every plot of Figure 7, columns 2 and 4. Listen- 
ers are perhaps able to average across earlier sections of such contours and 
establish an anchor against which terminal glide may be judged. 
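The movements discussed above can be tabulated directly from the f0 levels
quoted in the text. In the sketch below the peak levels (L, H, S) and the
precontour level are as stated; the numbering of the turning points 1 to 4 in
ascending order (100, 120, 145, 180 Hz) is inferred from the passages citing
H1, the x3 series, and the 180-Hz turning point, and the function name is our
own.

```python
PEAK_HZ = {"L": 130, "H": 160, "S": 200}
# Assumed mapping: turning points numbered in ascending order of f0.
TURNING_POINT_HZ = {1: 100, 2: 120, 3: 145, 4: 180}
PRECONTOUR_HZ = 130


def describe(series):
    """Summarize a series label such as 'H1' as f0 movements in Hz."""
    peak = PEAK_HZ[series[0]]
    turn = TURNING_POINT_HZ[int(series[1])]
    return {
        "peak_to_turn": turn - peak,                  # negative = fall before the glide
        "turn_vs_precontour": turn - PRECONTOUR_HZ,   # distance from the 130-Hz level
    }


for label in ("H1", "H4", "L4", "S3", "H3", "L3"):
    print(label, describe(label))
```

On these assumptions H1 falls 60 Hz before the terminal glide, H4 and L4 rise
into it, and the three x3 series all turn within 15 Hz of the precontour
level, which is consistent with their glides being the most accurately
perceived.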

All this implies that later sections of the contours in this study (that 
is, roughly the last 400 msec, from peak to turning point to end point) were 
processed by listeners as a single unit, with attention focussed on the ter- 
minal glide. If a listener was able to separate the glide perceptually from 
the immediately preceding section (as in the S3, H3, L3 series), his linguis- 
tic judgments followed pretty well the traditional formulation of rise for 
questions, fall for statements. If he was not able to separate the glide, due
to the difficulty (heightened perhaps for a complex speech signal) of tracking
a rapidly modulated frequency, relatively gross movements of the terminal glide
were necessary for him to be sure whether he had heard a rise or a fall, a
question or a statement.

Interpretation of the peak effect is more difficult. In our earlier
study, the effect was clear in both linguistic and psychophysical judgments
of both groups, though the Swedish were less consistent in their
psychophysical judgments than the Americans. In this study, a peak effect is
significantly present in linguistic judgments, totally absent from sine-wave
judgments, and, for speech psychophysical judgments, marginally present only
for the Americans.

We will consider the speech psychophysical data below. Here, the
important point is that the peak effect is reliably present in the linguistic,
but absent from the sine-wave, judgments. We may therefore, with reasonable
certainty, reject an auditory (or psychophysical) account and assign a direct
linguistic function to the peak. Unlike turning-point variations, peak
variations do not take linguistic effect by altering listeners' perceptions
of the terminal glide. Rather, the peak is a distinct element to
be weighed with the perceived terminal glide in determining the linguistic
outcome.

We should note, in caution, that peak and terminal glide are not always
simply additive in their effects. For example, a contour with a steady rise
from precontour to end point may require a relatively small terminal rise to
be heard as a question, despite its low peak (e.g., L3 series). Here, it
seems to be the overall sweep of the pattern that determines the judgment
rather than the frequency levels of particular segments of the contour.

However, with few exceptions, two factors would seem to govern linguistic
judgments of intonation contours, such as those of this study: fundamental
frequency at the peak and perceived terminal glide. The entire contour
is then interpreted as a unit with these factors in weighted combination, and
with the heavier weight being assigned to the terminal glide. If a terminal
fall is heard, the listener interprets the utterance as a statement, unless
the fall was slight and he has also heard a very high peak; if a terminal rise
is heard, the listener interprets the utterance as a question, unless the rise
was slight and he has also heard an unusually low peak (cf. Greenberg, 1969,
Ch. 2; Ohala, 1970, pp. 101 ff.).
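The verbal rule just stated may be written out as a small sketch. The
thresholds are hypothetical: the report gives no numerical definition of a
"slight" glide, and treating the 200-Hz (S) and 130-Hz (L) levels as the
"very high" and "unusually low" peaks is our assumption, as is the handling
of a perfectly level glide. The input is the perceived glide, which need not
equal the physical one.

```python
def judge(perceived_glide_hz, peak_hz,
          slight_hz=20.0,           # assumed bound on a "slight" glide
          very_high_peak_hz=200.0,  # assumed: the S peak level
          low_peak_hz=130.0):       # assumed: the L peak level
    """Classify a contour as 'question' or 'statement' from two cues."""
    slight = abs(perceived_glide_hz) <= slight_hz
    if perceived_glide_hz > 0:
        # A rise is a question, unless slight and paired with a low peak.
        if slight and peak_hz <= low_peak_hz:
            return "statement"
        return "question"
    # A fall (or, by assumption, a level glide) is a statement,
    # unless slight and paired with a very high peak.
    if slight and peak_hz >= very_high_peak_hz:
        return "question"
    return "statement"


print(judge(-40, 200))  # marked fall, very high peak
print(judge(-10, 200))  # slight fall, very high peak
print(judge(10, 130))   # slight rise, low peak
print(judge(30, 130))   # marked rise, low peak
```

The terminal glide carries the heavier weight in that the peak can change the
outcome only when the glide is slight; as noted above, such a rule does not
capture cases where the overall sweep of the contour decides the judgment.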

Auditory-linguistic interactions. We turn, finally, to the speech
psychophysical data. Our problem is to understand the instances in which speech
psychophysical judgments follow the linguistic more closely than the sine-wave
judgments. Obviously, these instances can only occur where linguistic
judgments of the entire contour differ from auditory judgments of the terminal
sine-wave glide, that is, where the contour carries some linguistically
relevant cue other than terminal glide. For questions, such cues include a
superhigh peak or a monotonic rise from precontour to turning point.
Accordingly we find a tendency for speech psychophysical judgments to follow
linguistic judgments in the superhigh (S) peak series (see Figure 3) and in the
high turning-point series (see Figures 4 and 5). Consider, particularly, the
results for speech contours of the H4 and L4 series. Listeners in both groups
often judge these contours both as questions and as terminally rising, even
though they are able to hear that the corresponding sine-wave contours have
terminal falls. Since listeners cannot have judged the contours to be questions
because they heard a terminal rise, we are tempted to conclude that they
heard the terminal rise because they judged the contours to be questions:
linguistic decision determined auditory shape.

Before elaborating on this, it is important to remark that such effects
do not always occur where they might be expected. For example, the peak
effect was clearly present in the speech psychophysical judgments of both
groups in our earlier study but is reduced to a marginal effect in the American
and has disappeared entirely from the Swedish speech psychophysical data of the
present study. We can hardly therefore call on the effect to support a
general account in terms of some specialized perceptual mechanism, such as that
proposed by Lieberman (1967). At the same time, the results are evidently
peculiar to speech and cannot be handled in purely auditory terms. What we
need, therefore, is an account in terms of a process that may vary with
experimental conditions and subjects.

An interesting hypothesis, suggested above, is that the results reflect
the blend of serial and parallel processing that characterizes the perception
of spoken language (and of other complex cognitive objects) (cf. Fry, 1956;
Chistovich et al., 1968; Studdert-Kennedy, in press). We may conceive the
perceptual process as divided into stages (auditory, phonetic, phonological,
etc.), but we must also suppose there to be feedback from higher to lower
levels which may serve to correct or verify earlier decisions. Perceptual
"correction" of an auditory or phonetic decision, in light of a higher
linguistic decision, will presumably not occur if the lower decision is firm.
Otherwise, we would not be able to deem the intonation of an actor "wrong"
or to understand a speaker, yet perceive his dialect to be unfamiliar.
However, in difficult listening conditions and under certain, as yet undefined,
acoustic conditions, perceptual "correction," sufficient to produce a
compelling phonetic illusion, may occur (Miller, 1956). Warren (1970; Warren
and Obusek, 1971) has shown that listeners may clearly perceive a phonetic
segment that has been excised from a recorded utterance and replaced by an
extraneous sound (cough, buzz, tone) of the same duration. The important
point is that listeners perceive the correct segment: the precise form of the
phonetic illusion is determined not by the acoustic conditions alone but also
by higher-order linguistic constraints.

Here, the illusion is auditory rather than phonetic, but a similar
mechanism may be at work. Asked to interrupt his normal perceptual process at
a prephonetic auditory stage, the listener falls back on his knowledge of the
language. As we have seen, the single most powerful cue for question/statement
judgments in this experiment was the terminal glide. Listeners evidently
prefer, and presumably expect, a question to end with a rise, a statement
with a fall (see Figure 6). However, earlier sections of the contour may also
enter into the decision and, if sufficiently marked, override an incompatible,
but relatively weak, terminal glide. Called upon to judge this glide, the
listener then assigns it a value consonant with his linguistic decision. That
is to say, if other factors dominate his linguistic decision, he may be led
into nonveridical perception of the terminal glide.

The degree to which this happens might be expected to vary with the
relative strengths of the cues controlling linguistic decision. And in fact,
just as the peak effect in the linguistic data was stronger for our first
study than for our second, so too was the peak effect in the speech
psychophysical data. Similarly, just as the question cue in the rising contours
of the H4 and L4 series is stronger for the Americans than for the Swedish, so
too is the tendency toward nonveridical judgment of the terminal glide.

However, we should not expect to be able to develop a fully coherent
account of our results in these terms, since we are ignorant of the limiting
linguistic and acoustic conditions of the illusion. We are currently planning
to broaden our understanding of the effect by taking advantage of what
is known about the various acoustic cues to word stress (Fry, 1955, 1958).
We might expect, for example, that, if linguistic decision can indeed
determine auditory shape, syllables of equal duration, judged to be differently
stressed on the basis of differences in either intensity or fundamental
frequency, would also be judged of unequal length. The ultimate interest of
the account is in its suggestion that the auditory level is not independent
of higher levels but is an integral part of the process by which we construct
our perceptions of spoken language.